Career Outcomes of International Master's Recipients from Chinese Institutions: A Study of Students From Three ASEAN States
As the third largest destination country for international postsecondary students, China has received nearly 500,000 international students, more than 20% of whom are from ASEAN member states (Department of International Cooperation and Exchanges, 2019). Compared with students from Western societies, most ASEAN students come from developing countries and may have a stronger need to generate career benefits through studying abroad. ASEAN students in China and their career outcomes, however, have long been overlooked in existing research.
In this qualitative study, I applied Human Capital Theory (HCT) and Neo-Racism Theory (NRT) to investigate the career outcomes of graduated ASEAN students who obtained a master's degree in Chinese Language from mainland China. I conducted in-depth, semi-structured interviews with 16 participants born in Malaysia, Myanmar, and Thailand, investigating their perceptions of the benefits and costs of studying in China, the factors impacting their career outcomes, and their suggestions for the Chinese government and universities. I also explored how participants' experiences and perceptions varied across sending countries.
Participants recognized that studying in China can improve their employability by enhancing their technical skills, language skills, and soft skills. Establishing professional networks, holding a master’s degree granted by Chinese universities, and learning from the workplace culture in China can also contribute to their professional development in both China and their home countries.
Based on participants' perceptions, the influential factors for career outcomes can be categorized into international/national, social/institutional, and personal/family factors. China-ASEAN economic cooperation has created opportunities for these participants, who have studied in China and know China well. China's unclear policies on international students, however, have confused participants and created barriers when they seek jobs in China. At the social level, some participants experienced discrimination against non-White races, which discouraged them from remaining in China, but most participants were impressed by China's development and wanted to work there. Participants improved their employability through the courses offered in their programs, and those who graduated from high-reputation universities, or from universities that cooperate with ASEAN states, tended to obtain better career opportunities. Most Chinese universities, however, adopt a segregation policy, dividing Chinese and international students into different classes and dorms; participants therefore lacked opportunities to interact with local students and build local networks. Moreover, many advisors in China had limited knowledge of ASEAN states and could not offer the necessary help with participants' career development. At the personal and family level, personal experience is vital in job-seeking, and family responsibilities and parents' expectations have pulled many participants back to their sending countries. Most participants had no suggestions for the Chinese government and institutions, although some expected fairer scholarship policies and clearer immigration regulations.
The results partly echo HCT and NRT but also challenge some of their arguments. This research reminds scholars to be more cautious when applying Western-originated theories in Asia; factors such as politics, culture, and economic development in the studied areas should be considered. This study also generated a model showing how influential factors interact with each other and impact participants' career outcomes.
Graph Summarization via Node Grouping: A Spectral Algorithm
Graph summarization via node grouping is a popular method to build concise graph representations by grouping nodes from the original graph into supernodes and encoding edges into superedges such that the loss of adjacency information is minimized. Such summaries have immense applications in large-scale graph analytics due to their small size and high query processing efficiency. In this paper, we reformulate the loss minimization problem for summarization into an equivalent integer maximization problem. By initially allowing relaxed (fractional) solutions for integer maximization, we analytically expose the underlying connections to the spectral properties of the adjacency matrix. Consequently, we design an algorithm called SpecSumm that consists of two phases. In the first phase, motivated by spectral graph theory, we apply k-means clustering on the k largest (in magnitude) eigenvectors of the adjacency matrix to assign nodes to supernodes. In the second phase, we propose a greedy heuristic that updates the initial assignment to further improve summary quality. Finally, via extensive experiments on 11 datasets, we show that SpecSumm efficiently produces high-quality summaries compared to state-of-the-art summarization algorithms and scales to graphs with millions of nodes.
Peer reviewed
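The spectral phase can be illustrated with a short sketch. This is not the authors' SpecSumm code: it is a minimal numpy reimplementation of the idea (k-means on the k largest-magnitude eigenvectors of the adjacency matrix), with the greedy refinement phase omitted and all names invented here.

```python
import numpy as np

def spectral_supernodes(A, k, iters=20):
    """Group nodes into k supernodes by k-means clustering on the k
    largest-magnitude eigenvectors of the adjacency matrix (the spectral
    phase of a SpecSumm-style summarizer); the greedy refinement phase
    that further improves the summary is omitted."""
    w, V = np.linalg.eigh(A)                 # A must be symmetric
    X = V[:, np.argsort(-np.abs(w))[:k]]     # spectral node embedding
    # Deterministic farthest-first seeding, then plain Lloyd's k-means.
    centers = [X[0]]
    for _ in range(k - 1):
        dists = np.min([((X - c) ** 2).sum(1) for c in centers], axis=0)
        centers.append(X[dists.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        labels = ((X[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

# Toy graph: two disjoint triangles; the natural 2-supernode summary
# groups each triangle into one supernode.
A = np.zeros((6, 6))
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    A[u, v] = A[v, u] = 1.0
labels = spectral_supernodes(A, k=2)
```

On this toy input the two triangles land in different supernodes, since their spectral embeddings collapse to two well-separated points.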
Streaming Algorithms for Diversity Maximization with Fairness Constraints
Diversity maximization is a fundamental problem with wide applications in data summarization, web search, and recommender systems. Given a set of elements, it asks to select a subset of elements with maximum diversity, as quantified by the dissimilarities among the elements within the subset. In this paper, we focus on the diversity maximization problem with fairness constraints in the streaming setting. Specifically, we consider the max-min diversity objective, which selects a subset that maximizes the minimum distance (dissimilarity) between any pair of distinct elements within it. Assuming that the set is partitioned into disjoint groups by some sensitive attribute, e.g., sex or race, ensuring fairness requires that the selected subset contains a specified number of elements from each group. A streaming algorithm should process the stream sequentially in one pass and return a subset with maximum diversity while guaranteeing the fairness constraint. Although diversity maximization has been extensively studied, the only known algorithms that can work with the max-min diversity objective and fairness constraints are very inefficient for data streams. Since diversity maximization is NP-hard in general, we propose two approximation algorithms for fair diversity maximization in data streams: the first is specific to the case of two groups, and the second provides an approximation guarantee for an arbitrary number of groups. Experimental results on real-world and synthetic datasets show that both algorithms provide solutions of comparable quality to the state-of-the-art algorithms while running several orders of magnitude faster in the streaming setting.
Comment: 13 pages, 11 figures; published in ICDE 202
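As a toy illustration of the streaming setting, the following one-pass heuristic keeps an arriving element when its group's quota is unfilled and it is at least a threshold distance from everything kept so far. This is an assumed simplification for intuition only; it is not either of the paper's algorithms and carries no approximation guarantee, and all names are invented here.

```python
import math

def fair_diverse_stream(stream, quotas, d):
    """One-pass threshold heuristic for fair max-min diversification:
    keep an arriving (point, group) pair if its group's quota is not yet
    filled and the point is at least d away from everything kept so far.
    Illustrative only -- no approximation guarantee is claimed."""
    kept = []                              # list of (point, group)
    counts = {g: 0 for g in quotas}
    for point, group in stream:
        if counts[group] >= quotas[group]:
            continue                       # group already has enough picks
        if all(math.dist(point, p) >= d for p, _ in kept):
            kept.append((point, group))
            counts[group] += 1
    return kept

# Points from two groups ("a"/"b"); quota of 2 per group, threshold 3.
stream = [((0, 0), "a"), ((0.5, 0), "a"), ((10, 0), "b"),
          ((10.5, 0), "b"), ((0, 10), "b"), ((10, 10), "a")]
picked = fair_diverse_stream(stream, quotas={"a": 2, "b": 2}, d=3)
```

Near-duplicate points such as (0.5, 0) are filtered out by the distance threshold, while the quota check enforces the fairness constraint.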
Spectral Normalized-Cut Graph Partitioning with Fairness Constraints
Normalized-cut graph partitioning aims to divide the set of nodes in a graph
into disjoint clusters to minimize the fraction of the total edges between
any cluster and all other clusters. In this paper, we consider a fair variant
of the partitioning problem wherein nodes are characterized by a categorical
sensitive attribute (e.g., gender or race) indicating membership to different
demographic groups. Our goal is to ensure that each group is approximately
proportionally represented in each cluster while minimizing the normalized cut
value. To resolve this problem, we propose a two-phase spectral algorithm
called FNM. In the first phase, we add an augmented Lagrangian term based on
our fairness criteria to the objective function for obtaining a fairer spectral
node embedding. Then, in the second phase, we design a rounding scheme to
produce clusters from the fair embedding that effectively trades off
fairness and partition quality. Through comprehensive experiments on nine
benchmark datasets, we demonstrate the superior performance of FNM compared
with three baseline methods.
Comment: 17 pages, 7 figures, accepted to the 26th European Conference on Artificial Intelligence (ECAI 2023)
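The two objectives FNM trades off can be made concrete with a small evaluation helper. This is my own sketch, not part of FNM: it computes the normalized-cut value of a given partition and each group's representation in each cluster relative to its overall share (1.0 meaning perfect proportional representation).

```python
import numpy as np

def ncut_and_balance(A, labels, groups):
    """Evaluate a partition on the two criteria a fair normalized-cut method
    balances: the normalized-cut value, and each group's per-cluster share
    divided by its share in the whole graph (1.0 = proportional)."""
    labels, groups = np.asarray(labels), np.asarray(groups)
    ncut = 0.0
    for c in np.unique(labels):
        mask = labels == c
        cut = A[mask][:, ~mask].sum()      # edge weight leaving cluster c
        vol = A[mask].sum()                # total degree inside cluster c
        ncut += cut / vol
    balance = {
        (c, g): np.mean(groups[labels == c] == g) / np.mean(groups == g)
        for c in np.unique(labels) for g in np.unique(groups)
    }
    return ncut, balance

# 4-cycle split into two edges; each cluster holds one member of each group.
A = np.zeros((4, 4))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[u, v] = A[v, u] = 1.0
ncut, balance = ncut_and_balance(A, labels=[0, 0, 1, 1],
                                 groups=["x", "y", "x", "y"])
```

Here every balance ratio equals 1.0 (a perfectly fair partition) and the normalized-cut value is 1.0.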
Fair and Representative Subset Selection from Data Streams
We study the problem of extracting a small subset of representative items from a large data stream. In many data mining and machine learning applications such as social network analysis and recommender systems, this problem can be formulated as maximizing a monotone submodular function subject to a cardinality constraint k. In this work, we consider the setting where data items in the stream belong to one of several disjoint groups and investigate the optimization problem with an additional fairness constraint that limits selection to a given number of items from each group. We then propose efficient algorithms for the fairness-aware variant of the streaming submodular maximization problem. In particular, we first give a (1/2-ε)-approximation algorithm that requires O((1/ε) log(k/ε)) passes over the stream for any constant ε>0. Moreover, we give a single-pass streaming algorithm that has the same approximation ratio of (1/2-ε) when unlimited buffer sizes and post-processing time are permitted, and discuss how to adapt it to more practical settings where the buffer sizes are bounded. Finally, we demonstrate the efficiency and effectiveness of our proposed algorithms on two real-world applications, namely maximum coverage on large graphs and personalized recommendation.
Peer reviewed
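The fairness constraint can be illustrated with a quota-respecting greedy for maximum coverage. This is a plain batch sketch, not the paper's streaming algorithms, and the instance and names are invented for illustration.

```python
def fair_greedy_coverage(sets_by_item, quotas, k):
    """Greedy maximum coverage under per-group selection limits: each step
    picks the item with the largest marginal coverage whose group cap is
    not yet reached. An offline illustration of the fairness constraint;
    the paper's algorithms instead work over a data stream."""
    covered, chosen = set(), []
    counts = {g: 0 for g in quotas}
    items = dict(sets_by_item)          # item -> (group, set of elements)
    while len(chosen) < k:
        best, gain = None, 0
        for item, (group, elems) in items.items():
            if counts[group] >= quotas[group]:
                continue                # this group's cap is exhausted
            marginal = len(elems - covered)
            if marginal > gain:
                best, gain = item, marginal
        if best is None:                # nothing adds coverage or fits a cap
            break
        group, elems = items.pop(best)
        chosen.append(best)
        covered |= elems
        counts[group] += 1
    return chosen, covered

# Items from two demographic groups with a cap of one selection per group.
catalog = {
    "u1": ("g1", {1, 2, 3, 4}),
    "u2": ("g1", {1, 2, 3}),
    "u3": ("g2", {5, 6}),
    "u4": ("g2", {4, 5}),
}
chosen, covered = fair_greedy_coverage(catalog, quotas={"g1": 1, "g2": 1}, k=2)
```

After picking "u1", the cap on group g1 blocks "u2", so the second pick comes from g2 even though it covers fewer new elements.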
Balancing Utility and Fairness in Submodular Maximization (Technical Report)
Submodular function maximization is central in numerous data science
applications, including data summarization, influence maximization, and
recommendation. In many of these problems, our goal is to find a solution that
maximizes the average of the utilities for all users, each measured by a
monotone submodular function. When the population of users is composed of
several demographic groups, another critical problem is whether the utility is
fairly distributed across groups. In the context of submodular optimization, we
seek to improve the welfare of the least well-off group, i.e., to
maximize the minimum utility for any group, to ensure fairness. Although the
utility and fairness objectives are both desirable, they might
contradict each other, and, to our knowledge, little attention has been paid to
optimizing them jointly. In this paper, we propose a novel problem called
Bicriteria Submodular Maximization (BSM) to strike a balance between
utility and fairness. Specifically, it requires finding a fixed-size solution
to maximize the utility function, subject to the value of the fairness function
not being below a threshold. Since BSM is inapproximable within any constant
factor in general, we propose efficient data-dependent approximation algorithms
for BSM by converting it into other submodular optimization problems and
utilizing existing algorithms for the converted problems to obtain solutions to
BSM. Using real-world and synthetic datasets, we showcase applications of our
framework in three submodular maximization problems, namely maximum coverage,
influence maximization, and facility location.
Comment: 13 pages, 7 figures, under review
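On a tiny instance, the BSM problem statement itself can be checked by brute force. The sketch below uses a toy coverage formulation of my own (items, groups, and utilities are invented): it enumerates size-k solutions and keeps the best one whose minimum per-group utility clears the threshold. The paper's contribution is efficient approximation algorithms, which this does not reproduce.

```python
from itertools import combinations

# Toy coverage instance: each item covers some elements, and each
# demographic group derives utility from a subset of the elements.
ITEMS = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {5, 6}}
GROUP_ELEMS = {"g1": {1, 2, 3}, "g2": {4, 5, 6}}

def utilities(sol):
    covered = set().union(*(ITEMS[i] for i in sol))
    overall = len(covered)                         # utility objective
    per_group = {g: len(covered & e) for g, e in GROUP_ELEMS.items()}
    return overall, per_group                      # fairness = min over groups

def bsm_bruteforce(k, tau):
    """Brute-force version of the BSM problem: among size-k solutions whose
    minimum per-group utility is at least tau, maximize overall utility.
    Only feasible on tiny instances."""
    best, best_util = None, -1
    for sol in combinations(sorted(ITEMS), k):
        overall, per_group = utilities(sol)
        if min(per_group.values()) >= tau and overall > best_util:
            best, best_util = set(sol), overall
    return best, best_util

best, best_util = bsm_bruteforce(k=2, tau=2)
```

With tau=2 the fairness threshold rules out utility-heavy but one-sided solutions like {"a", "b"}, which give group g2 zero utility.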
Towards an Instance-Optimal Z-Index
We present preliminary results on instance-optimal variants of the Z-index, a well-known spatial index that makes use of the Z-order curve. Unlike the base Z-index, the variants we propose aim to adapt to the data and range-query workloads of the given setting. Specifically, we provide an optimal algorithm that builds a Z-index that minimizes the expected number of retrieved data points for the given data and query workload. Moreover, since the optimal algorithm requires supra-linear running time, we additionally propose efficient heuristic algorithms to use in its place. Our experiments evaluate the performance of the resultant Z-indexes.
Peer reviewed
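For readers unfamiliar with the Z-order curve underlying the Z-index, here is a minimal Morton-encoding sketch. This is the standard bit-interleaving construction, not the paper's instance-optimal variant.

```python
def z_value(x, y, bits=16):
    """Morton/Z-order value of a 2-D point: interleave the bits of x and y.
    Sorting points by this value arranges them along the Z-order curve
    that a Z-index partitions into pages."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x bits at even positions
        z |= ((y >> i) & 1) << (2 * i + 1)   # y bits at odd positions
    return z

# The four cells of a 2x2 grid are visited in the curve's namesake "Z" shape.
order = sorted([(1, 1), (0, 0), (1, 0), (0, 1)], key=lambda p: z_value(*p))
```

Because interleaving preserves locality, points close in 2-D space tend to be close in Z-value, which is what makes one-dimensional range partitioning over Z-values effective for spatial queries.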
Coresets for minimum enclosing balls over sliding windows
Coresets are important tools to generate concise summaries of massive
datasets for approximate analysis. A coreset is a small subset of points
extracted from the original point set such that certain geometric properties
are preserved with provable guarantees. This paper investigates the problem of
maintaining a coreset to preserve the minimum enclosing ball (MEB) for a
sliding window of points that are continuously updated in a data stream.
Although the problem has been extensively studied in batch and append-only
streaming settings, no efficient sliding-window solution is available yet. In
this work, we first introduce an algorithm, called AOMEB, to build a coreset
for MEB in an append-only stream. AOMEB improves the practical performance of
the state-of-the-art algorithm while having the same approximation ratio.
Furthermore, using AOMEB as a building block, we propose two novel algorithms,
namely SWMEB and SWMEB+, to maintain coresets for MEB over the sliding window
with constant approximation ratios. The proposed algorithms also support
coresets for MEB in a reproducing kernel Hilbert space (RKHS). Finally,
extensive experiments on real-world and synthetic datasets demonstrate that
SWMEB and SWMEB+ achieve speedups of up to four orders of magnitude over the
state-of-the-art batch algorithm while providing coresets for MEB with rather
small errors compared to the optimal ones.
Comment: 28 pages, 10 figures, to appear in the 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '19)